Abstract 

This paper shows that the logarithm of the number of solutions of a random planted 
k-SAT formula concentrates around a deterministic n-independent threshold. Specifically, 
if $F^*_k(\alpha,n)$ is a random k-SAT formula on n variables, with clause density $\alpha$ and with a 
uniformly drawn planted solution, there exists a function $\phi_k(\cdot)$ such that, except for 
$\alpha$ in a set of Lebesgue measure zero, we have $\frac{1}{n}\log Z(F^*_k(\alpha,n)) \to \phi_k(\alpha)$ in probability, 
where $Z(F)$ is the number of solutions of the formula F. This settles a problem left 
open in Abbe-Montanari RANDOM 2013, where the concentration is obtained only for 
the expected logarithm over the clause distribution. The result is also extended to a 
more general class of random planted CSPs; in particular, it is shown that the number of 
pre-images for the Goldreich one-way function model concentrates for some choices of the 
predicates. 
1 Introduction 


This paper investigates concentration phenomena for the number of solutions in random 
planted constraint satisfaction problems (CSPs) and the Goldreich one-way function 
candidate. 

A large body of work has studied phase transition phenomena for satisfiability in 
random CSPs. For uniform¹ models, the probability of being satisfiable often tends to 
a step function as n tends to infinity, jumping from 1 to 0 when the constraint density 
crosses a critical threshold. For random k-XORSAT the existence of such a critical threshold 
is proved [21,23,25,46]. For random 2-SAT, the threshold is proved in [17,22,33]. For 
random k-SAT, $k \geq 3$, the existence of an n-dependent threshold is proved in [31], and 
the satisfiability threshold conjecture states that this threshold is n-independent for all k. 
Recently, the conjecture was settled for k large enough [24], while upper and lower bounds are 
known to match up to a term that is of relative order $k2^{-k}$ as k increases [11,18]. Moreover, 
phase transition phenomena were also studied for a broad family of other CSPs, see for 
example [10,11,44] and references therein. 

The counting problem for random formulas has also received attention recently. In [4], 
a concentration result is obtained for the number of solutions: at a fixed clause density $\alpha$, 
the number of solutions of a random 2-SAT formula concentrates in the logarithmic scale 
to a deterministic n-independent threshold for almost every $\alpha$. This result is extended to 
$k \geq 3$ for all clause densities at which the UNSAT probability decays fast enough (a 
mild logarithmic decay being enough), which is conjectured to take place up to the SAT 
threshold. This result is obtained in two parts. First, as was shown earlier in [2,6], the 
property that a random k-SAT formula has a number of solutions bounded by $2^{n\phi}$, for a 
fixed $\phi$, has a phase transition with an n-dependent threshold, proved à la Friedgut. This is 
then turned into a concentration result for the number of solutions in [4] by showing that 
the limit for $\frac{1}{n}\mathbb{E}\log(1 + Z(F))$ exists, where $Z(F)$ is the number of solutions of the random 
formula. Observe that this gives an n-independent threshold for the concentration. The key 
tool in establishing this limit is the interpolation method, first introduced in [37] for the 
Sherrington-Kirkpatrick model, and subsequently generalized and extended in [4,15,28,29,45]. 
Note however that the use of “1+” in the logarithm above (to obtain a well-defined quantity) 
is responsible for the difficulty in obtaining the concentration for all clause densities when 
$k \geq 3$. 

In this paper, we consider CSPs that have a planted solution and study the counting 
problem for random ensembles. Planted CSPs are a rich ground for studying combinatorial 
optimization problems motivated by ‘real-world’ applications, such as in coding theory, 
community detection, or cryptography, where a solution typically does exist but where the 
problem is to identify how many other solutions there are, or how hard it is to recover 
the planted solution. Planted ensembles were investigated in [5,8,9,14,38,39], and at high 
density in [12,19,27], and relationships between planted random CSPs and their non-planted 
counterparts in the satisfiable phase were studied in [5,42,48]. 

It was shown recently in [3] that for a broad class of random planted CSPs, the logarithm 
of the number of solutions concentrates to an n-independent deterministic threshold for almost 
every clause density. In particular, this covers k-SAT for all k. Hence, the planting makes it 
possible to circumvent the issues of establishing the limit of the log-partition function, since the 
latter is well defined due to the planting (no need for the “1+” term discussed above). However, the 
planting also introduces asymmetry in the model, which led [3] to a weaker concentration 
result: the concentration is obtained with respect to the graph ensemble but is taken in 
expectation over the clause distribution. 

¹The model may have a fixed but uniform number of constraints, or a Binomial or equivalent form. 
Let us explain this nuance more precisely for random k-SAT. A random planted formula 
is defined in this case by drawing first a random uniform solution $x^*$, and independently, a 
random k-hypergraph $G = ([n],E)$ at a fixed edge density. The random clauses are then 
defined for each edge $e \in E(G)$ by drawing a negation pattern $s_e$ uniformly at random within 
the set of negation patterns that preserve $x^*$ as a planted solution. Specifically, the clause 
for edge e is defined by $y[e] \neq s_e$ (where $y[e]$ is an assignment of literals to the variables 
associated with e). Note that this is indeed equivalent to requiring that the OR of the 
variables in $y[e]$ negated with the pattern $s_e$ is 1. Consider now 

$$\phi_n := \frac{1}{n}\log(Z(F(\alpha))),$$

where $F(\alpha)$ is the random planted formula. In [1], it is shown that $\mathbb{E}_s\phi_n$, the expectation 
of $\phi_n$ taken over the variables $s = \{s_e\}_{e \in E(G)}$, concentrates in probability (with respect 
to the drawing of G) to a deterministic n-independent value.² It was left open to obtain 
concentration with respect to the drawing of s as well. In particular, the martingale argument 
used in [1] fails in this case, since the fluctuations are not bounded, and the application of 
Friedgut’s theorem is hindered by the lack of symmetry caused by the planting. 

We resolve in this paper the above problem left open in [1] and show that for almost every 
$\alpha$, there exists an n-independent value $\phi(\alpha)$ such that 

$$\frac{1}{n}\log(Z(F(\alpha))) \to \phi(\alpha) \quad \text{in probability},$$

closing the concentration problem. The main tool is based on Bourgain’s result from the 
appendix of [31]. The result is then generalized to a broad class of planted CSPs, and a new 
application to Goldreich’s one-way function [36] is investigated. 

The Goldreich one-way function candidate is defined from a k-hypergraph G on n vertices 
and m hyperedges and a fixed predicate function $\chi : \{0,1\}^k \to \{0,1\}$. The function takes an 
input $x \in \{0,1\}^n$ and, evaluating $\chi$ on x at each of the m k-tuples selected by the hyperedges 
of G, produces an output in $\{0,1\}^m$. In [36], G is proposed to be drawn at random with 
an edge density m/n, and the choice of predicates is further discussed in [16]. Note that 
both $m = \omega(n)$ and $m = O(n)$ are potential candidates [16,36]. Defining the rate of the 
one-way function by m/n, it is interesting to understand for which rates (in addition to which 
predicates) the function is possibly one-way, in particular in the case where m/n is a constant. 
A natural approach would be to relate this question to the structure of the solution space of 
the underlying CSP, starting with its size, and hypothetically with the condensation [20,41] 
and freezing of the solution clusters [47] phenomena.³ In particular, the function $\phi(\cdot)$ is 
expected to have a kink at the condensation threshold for various CSPs [43], and this may 
indicate a behavioral change for the hardness of the one-way function. In this paper, we 
investigate the most basic question towards such considerations: does the function $\phi$ even 
exist? Namely, does the normalized logarithm of the number of pre-images concentrate for 
some/all predicates? 

We answer this question in the affirmative for a certain class of predicates. Interestingly, it 
is not obvious that this class of predicates overlaps with the class of predicates that precludes 
the non-hardness conditions introduced in [16] for large clause densities. We hence leave an 
open problem: can one obtain concentration and hardness at the same time, or is hardness 
related to the non-concentration? We believe that the former is true and that our proof 
technique stumbles on technicalities, but we cannot resolve this argument. 

²Note that the variables $s = \{s_e\}_{e \in E(G)}$ depend on the planted assignment. For a deterministic kernel Q, 
this means in expectation over the planted assignment. 

³In the non-planted models, these phenomena have been associated with computational barriers for 
satisfiability. 

2 Models 

2.1 CSPs arising from satisfiability problems 

We first describe a class of constraint satisfaction problems. Let $V = \{v_1,\dots,v_n\}$ be a set of 
Boolean variables, and fix an integer $k \geq 2$. An instance F of a CSP consists of a k-uniform 
multi-hypergraph $(V,E)$ (that is, all edges have cardinality k and we allow parallel edges), 
and a family of clause functions $\chi_e : \{0,1\}^k \to \{0,1\}$ for each $e \in E$. A k-clause comprises 
an edge e and its corresponding function $\chi_e$. We will sometimes call F a formula. The form of 
the clause function depends on the type of satisfiability problem we are interested in (for the 
moment, SAT, NAESAT or XORSAT). Let $y[V]$ denote an assignment $y_1,\dots,y_n$ of Boolean 
values to the variables in V, and $y[e]$ its restriction to the k variables in e. By $\chi_e(y[e])$ we 
mean the result of evaluating $\chi_e$ on the k values in some fixed order (for the moment, the 
actual choice of ordering of variables in edges isn’t important but it will be when we consider 
certain planted models in Section 4.2). This model naturally captures familiar satisfiability 
problems: 

• in k-SAT, we have $\chi_e(y[e]) = 1 \iff y[e] \neq x_e$ where $x_e \in \{0,1\}^k$ represents a 
particular forbidden pattern, 

• in k-NAESAT, we have $\chi_e(y[e]) = 1 \iff y[e] \notin \{x_e, \bar{x}_e\}$ where $\bar{x}_e$ is the result of 
flipping each component of $x_e$, 

• in k-XORSAT, we have $\chi_e(y[e]) = 1 \iff \bigoplus_{i \in e} y_i = x_e$ where now $x_e \in \{0,1\}$. 
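
As an illustration, the three clause types above can be written as small Boolean functions. The following Python sketch is not taken from the paper: the function names and the encoding of $y[e]$ and $x_e$ as tuples of 0/1 values are assumptions made here for concreteness.

```python
from functools import reduce
from operator import xor


def sat_clause(y_e, x_e):
    """k-SAT: satisfied unless y[e] equals the forbidden pattern x_e."""
    return int(tuple(y_e) != tuple(x_e))


def naesat_clause(y_e, x_e):
    """k-NAESAT: satisfied unless y[e] equals x_e or its bitwise complement."""
    flipped = tuple(1 - b for b in x_e)
    return int(tuple(y_e) not in (tuple(x_e), flipped))


def xorsat_clause(y_e, x_e):
    """k-XORSAT: satisfied iff the XOR of the k values equals the bit x_e."""
    return int(reduce(xor, y_e) == x_e)
```

For k = 3 and pattern (0,1,1), for instance, `sat_clause` rejects only the assignment (0,1,1), while `naesat_clause` rejects both (0,1,1) and (1,0,0).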

An assignment which satisfies all clauses in F is called a satisfying assignment (or solution) 
for F. Let $C_k(n)$ be the set of all possible k-clauses on V and write $N = |C_k(n)|$; in k-SAT 
for example we have $N = \binom{n}{k}2^k$. We use the binomial model for a random CSP with clause 
density $\alpha \in [0, N/n]$, and draw a random formula G as follows:⁴ 

• include in G each clause in $C_k(n)$ with probability $p = \alpha n/N$. (1) 

Let $G(n,\alpha)$ denote a formula obtained by this process. The formula $G(n,\alpha)$ can be viewed 
as a random element of $\{0,1\}^N$, drawn according to the product measure $\mu_p$.⁵ That is, for 
each $x \in \{0,1\}^N$ we have $\mu_p(x) := \mathbb{P}[G(n,\alpha) = x] = p^{|x|}(1-p)^{N-|x|}$ (where $|x|$ denotes the 
number of nonzero components). Now consider a procedure to sample a planted CSP F. 

• Sample $v^0 \in \{0,1\}^n$ uniformly at random. (2) 

• Then include in F each k-clause which is satisfied by $v^0$ independently with probability 
$p = \alpha n/N$. 
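
The sampling procedure (2) can be sketched for k-SAT as follows. This is a minimal illustration under assumptions made here, not code from the paper: a clause is encoded as a pair (edge, forbidden pattern), the names `sample_planted_ksat` and `count_solutions` are hypothetical, and $Z(F)$ is computed by exhaustive enumeration, which is only feasible for very small n.

```python
import itertools
import math
import random


def sample_planted_ksat(n, k, alpha, rng):
    """Sample a planted k-SAT formula following procedure (2).

    Returns (v0, clauses). Each clause is a pair (edge, forbidden), where
    edge is a k-subset of variable indices and forbidden is the pattern
    that y[e] must avoid.
    """
    v0 = tuple(rng.randint(0, 1) for _ in range(n))
    N = math.comb(n, k) * 2 ** k   # number of possible k-SAT clauses
    p = alpha * n / N              # inclusion probability
    clauses = []
    for edge in itertools.combinations(range(n), k):
        planted = tuple(v0[i] for i in edge)
        for forbidden in itertools.product((0, 1), repeat=k):
            # Only clauses satisfied by the planted assignment are eligible.
            if forbidden != planted and rng.random() < p:
                clauses.append((edge, forbidden))
    return v0, clauses


def count_solutions(n, clauses):
    """Brute-force Z(F): number of assignments satisfying every clause."""
    return sum(
        all(tuple(y[i] for i in edge) != forbidden for edge, forbidden in clauses)
        for y in itertools.product((0, 1), repeat=n)
    )
```

For example, `v0, F = sample_planted_ksat(10, 3, 2.0, random.Random(0))` followed by `count_solutions(10, F)` returns a value of at least 1, since the planted assignment always satisfies F.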

We use the notation $F(n,\alpha)$ to denote a formula obtained by this process. By construction 
such a formula is always satisfied by the assignment $v_i = v_i^0$; the vector $v^0$ is known as the 
planted solution. Let $Z(F)$ denote the cardinality of the set of assignments to $v_1,\dots,v_n$ 
which satisfy F. Notice that we always have $Z(F(n,\alpha)) \geq 1$ by construction. Again we 
view $F(n,\alpha)$ as an element of $\{0,1\}^N$ but observe that the distribution in this case satisfies, 
in k-SAT for example, 
$\mu_p(F) = \mathbb{P}[F(n,\alpha) = F] = 2^{-n} Z(F)\, p^{|F|}(1-p)^{N(1-2^{-k})-|F|}$, so $\mu_p$ is not a product measure here. 

⁴One could also consider the uniform model, wherein $G(n,\alpha)$ is chosen uniformly from those vectors 
$x \in \{0,1\}^N$ with $|x| = \alpha n$, where $\alpha n/N = p$. The models are essentially equivalent, and we mostly focus here 
on the binomial model. 

⁵To see the correspondence, identify each of the N components with a clause and set it to 1 if and only if 
the clause is present in the formula. 

2.2 CSPs arising from Goldreich’s one-way function candidate 

The CSPs introduced in the previous section have clause functions taking a few specific 
forms. In these examples a satisfying variable assignment $y[V]$ satisfies $\chi_e(y[e]) = 1$ for all e, and 
the clause functions on individual edges are independent of one another. Our concentration 
results can be extended to a class of CSPs related to Goldreich’s proposed 
one-way function [34]. The idea is that we can consider CSPs with arbitrary clause functions 
if the clauses on different edges are related in a specific way. 

In [34] Goldreich proposed a candidate one-way function family which exploits the difficulty 
of recovering a solution to a form of planted CSP.⁶ Goldreich’s original proposition was that 
the following function f is one-way. As always we work with the variable set $V = \{v_1,\dots,v_n\}$. 

• Select a predicate $\chi : \{0,1\}^k \to \{0,1\}$ uniformly at random from the set of all such 
Boolean functions. 

• Draw a sparse Erdős-Rényi k-uniform multi-hypergraph $(V,E)$ with m edges $e_1,\dots,e_m$. 

• $f : \{0,1\}^n \to \{0,1\}^m$ is the function with $f(x)_i = \chi(x[e_i])$, i.e. the i-th output bit is the 
result of evaluating $\chi$ on the k (ordered) values assigned to the edge $e_i$. 

More precisely, Goldreich conjectured that f is one-way in the setting where $k = O(\log n)$ 
and $m = n$, and the graph is a sufficiently good expander, for most choices of the predicate $\chi$ 
which is randomly selected and hard-wired into f. 
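
To make the construction concrete, the following Python sketch evaluates a function of this form and counts pre-images by brute force. It is an illustration only: the helper names, the truth-table representation of $\chi$, and the way edges are drawn (uniform ordered k-tuples of distinct indices, with no expansion check) are assumptions of this sketch rather than details fixed by the text.

```python
import itertools
import random


def random_predicate(k, rng):
    """Draw a uniformly random Boolean function on k bits as a truth table."""
    table = [rng.randint(0, 1) for _ in range(2 ** k)]

    def chi(bits):
        index = 0
        for b in bits:          # read the k bits as a binary index
            index = 2 * index + b
        return table[index]

    return chi


def sample_hypergraph(n, m, k, rng):
    """Draw m ordered k-tuples of distinct variable indices (repeats allowed)."""
    return [tuple(rng.sample(range(n), k)) for _ in range(m)]


def goldreich(x, edges, chi):
    """Evaluate f(x): the i-th output bit is chi applied to x restricted to e_i."""
    return tuple(chi(tuple(x[j] for j in e)) for e in edges)


def count_preimages(n, edges, chi, y):
    """Brute-force number of inputs x in {0,1}^n with f(x) = y (tiny n only)."""
    return sum(
        goldreich(x, edges, chi) == tuple(y)
        for x in itertools.product((0, 1), repeat=n)
    )
```

The pre-image count of $y = f(x)$ is always at least 1, which is the planted-solution structure exploited in the procedure that follows.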

With this in mind we can define a class of planted CSPs generated by the following 
procedure. 


• Select a predicate $\chi : \{0,1\}^k \to \{0,1\}$. 

• Sample $v^0 \in \{0,1\}^n$ uniformly at random. (3) 

• Then include in F each k-clause of the form e with $\chi(y[e]) = \chi(v^0[e])$, 
with probability $p = \alpha n/N$. 

Here the edges are ordered subsets of V, and so $N = \binom{n}{k}k!$. 

3 Overview of results 

Recall that for a CSP formula F (planted or not) we denote by $Z(F)$ the number of 
satisfying assignments for F. If $\phi \in [0,1]$ we write $Q_n(\alpha,\phi) := \mathbb{P}[Z(F(n,\alpha)) < 2^{\phi n}]$. 

3.1 Concentration of the number of solutions of planted satisfiability CSPs 

Our main result is the following theorem, which states that for fixed $\alpha > 0$ the logarithm of 
the number of solutions of a random planted formula concentrates, closing the problem left 
open in [4]. Note that this clears the concentration problem in its most general form: the 
exponent of the number of solutions of a random planted SAT formula can be asymptotically 
predicted with an n-independent value and for any $k \geq 2$ (small or large). The only part that 
could be further generalized is the fact that the result is not established for a countable set of 
“bad” $\alpha$’s, but it is unclear whether this is a technicality or not. The formal result reads as 
follows. 

⁶One-way functions are important objects in cryptography and complexity theory. Intuitively these are 
functions that are computationally easy to evaluate, but hard to invert. For a thorough discussion see [35], [34]. 

Theorem 1. For every $k \geq 2$, there exist a countable set $\mathcal{C}$ and a function $\phi_s : [0,\alpha^*] \to [0,1]$ 
such that for every $\alpha \notin \mathcal{C}$ and every $\epsilon > 0$, 

$$\lim_{n\to\infty} Q_n(\alpha, \phi_s(\alpha) - \epsilon) = 0,$$

$$\lim_{n\to\infty} Q_n(\alpha, \phi_s(\alpha) + \epsilon) = 1.$$

In [3], it was shown that this quantity concentrates when the expectation is taken over 
the clause distribution. We use this result in the proof of Theorem 1. 

Theorem 2. [3] For every $k \geq 2$, for every $\alpha \in [0,\alpha^*]$ the sequence 

$$\psi_n(\alpha) := \frac{1}{n}\mathbb{E}[\log Z(F(n,\alpha))]$$

converges almost surely to a limit $\phi_s(\alpha)$. 

As an intermediate step toward Theorem 1, we will prove that for fixed $\phi \in [0,1]$ there 
is a sharp threshold density for the property of having fewer than $2^{\phi n}$ solutions (we define 
these terms in Section 4.1). First, in Section 4.2 we prove the following n-dependent sharp 
threshold. 

Lemma 3. For every $k \geq 2$ and for every $\phi \in [0,1)$ there exists a sequence $\{\alpha_n(\phi)\}_{n \in \mathbb{Z}_{>0}}$ 
such that for every $\epsilon > 0$, 

$$\lim_{n\to\infty} Q_n(\alpha_n(\phi) - \epsilon, \phi) = 0,$$

$$\lim_{n\to\infty} Q_n(\alpha_n(\phi) + \epsilon, \phi) = 1.$$

In fact, we prove Lemma 3 for a larger class of planted CSPs, namely those which arise 
from Goldreich’s one-way function candidate [34]. This allows us to deduce the analogous 
statement of Theorem 1 for certain instances of these CSPs, as well as an n-dependent version 
of it in general. 

In Section A we combine Lemma 3 with Theorem 2 using a technique from [4] to show 
that the sequence $\alpha_n(\phi)$ converges. 

Theorem 4. For every $k \geq 2$, there exist a countable set $\mathcal{C}$ and a function $\phi_s : [0,\alpha^*) \to [0,1]$ 
such that for each $\phi \in \phi_s([0,\infty))$ there exists $\alpha_s(\phi)$ such that for each $\epsilon > 0$, 

$$\lim_{n\to\infty} Q_n(\alpha_s(\phi) - \epsilon, \phi) = 0$$

and 

$$\lim_{n\to\infty} Q_n(\alpha_s(\phi) + \epsilon, \phi) = 1.$$

We deduce Theorem 1 from Theorem 4 in Section A. 

3.2 Concentration of the number of solutions of CSPs from Goldreich’s 
one-way function candidates 

We now present concentration results for the number of solutions of the CSPs arising from 
Goldreich’s one-way function candidates described in Section 2.2. 

If one considers the logarithm of the number of solutions of the one-way function candidate 
determined by a random graph G, a predicate $\chi$ and a uniform input, and takes the average 
over the input distribution, it is possible to obtain the following concentration result. Note 
that this gives a stronger concentration notion, i.e., almost sure and for every $\alpha$, and imposes 
no restriction on the choice of $\chi$. However, it provides an n-dependent threshold and requires 
averaging over the input distribution. 

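Lemma 5. Let $F(n,\alpha)$ be a formula drawn as in (3). Then for every $k \geq 2$, there exists a 
function $\phi_s^n : [0,\alpha^*] \to [0,1]$, namely $\phi_s^n(\alpha) = \frac{1}{n}\mathbb{E}_{G,v^0} \log Z(F(n,\alpha))$, such that for every $\alpha > 0$ 
and every $\epsilon > 0$, the following holds almost surely: 

$$\lim_{n\to\infty} \frac{1}{n}\left(\mathbb{E}_{v^0} \log Z(F(n,\alpha)) - \mathbb{E}_{G,v^0} \log Z(F(n,\alpha))\right) = 0.$$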

The proof is found in Appendix Section B. 

We can dispose of the dependence of $\phi_s$ on n and of the averaging over the input in the 
previous lemma for certain choices of $\chi$. We simply need to remark that Theorem 2 was 
in fact shown in [4] to hold for planted formulas $F(n,\alpha)$ which satisfy a certain convexity 
hypothesis (let’s call it H for now). The proof of the following theorem then follows that of 
Theorem 1 exactly as in Section A. To this end, in the Appendix Section C we prove the 
analogue of Lemma 3 for these CSPs. This allows us to deduce the analogous statement of 
Theorem 1 for certain instances of these CSPs. 

We say that a predicate $\chi : \{0,1\}^k \to \{0,1\}$ is balanced if it evaluates to 1 on exactly half 
of the inputs and we say $\chi$ is antisymmetric if $\chi(\bar{x}) = 1 - \chi(x)$ for some $x \in \{0,1\}^k$. 

Theorem 6. Let $F(n,\alpha)$ be a formula drawn as in (3), with a predicate $\chi$ which is antisymmetric 
and satisfies Hypothesis H. Then for every $k \geq 2$, there exist a countable set $\mathcal{C}$ and a function 
$\phi_s : [0,\alpha^*] \to [0,1]$ such that for every $\alpha \notin \mathcal{C}$ and every $\epsilon > 0$, 

$$\lim_{n\to\infty} Q_n(\alpha, \phi_s(\alpha) - \epsilon) = 0,$$

$$\lim_{n\to\infty} Q_n(\alpha, \phi_s(\alpha) + \epsilon) = 1.$$

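The hypothesis H, stated in terms of $\chi$, is as follows. 

Definition 1. Let $M_1(\{0,1\}^\ell)$ denote the space of probability measures on $\{0,1\}^\ell$. Let $\ell \geq 1$. 
Define $T : M_1(\{0,1\}^\ell) \to \mathbb{R}$ by a sum over tuples satisfying 

$$\chi(u^{(1)}) = \cdots = \chi(u^{(\ell)})$$

(the summand of the displayed formula is missing from this copy of the text). 

Hypothesis H. For each $\ell \geq 1$, the operator T is convex in $\nu$. 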

Bogdanov and Qiao showed in [16] that for many choices of $\chi$, Goldreich’s function can 
be inverted with high probability when m is larger than n by a sufficiently large constant 
factor. In particular any $\chi$ which is not balanced or whose output correlates with one or two 
bits of the input is a bad choice when $m = Dn$ for sufficiently large constant D. Their result 
suggests that if we want the resulting function to be one-way then we may want $\chi$ to be 
balanced and not correlated with any bit or pair of bits of the input, but it is unclear whether 
these conditions would be necessary in the regime $m = n$, the one Goldreich originally suggested. 

Strictly speaking, the restriction to antisymmetric $\chi$ in Theorem 6 does not seem necessary. 
It is a technical condition which arises in the proof of Lemma 3. We have verified using a 
computer search that when k < 5 no antisymmetric function satisfies the balance properties 
along with Hypothesis H, but it remains unclear to us whether such a function can exist in 
general. 
4 An overview of the proofs 


The main element in our proofs is Lemma 3, whose proof we give in this section. From there, 
obtaining Theorem 1 and its analogues is a straightforward argument given in the Appendix 
Section A. 


4.1 Sharp thresholds and Bourgain’s theorem 


Before proceeding to the proof of Lemma 3, we briefly give a bit of background material 
on sharp thresholds. A subset $A_n \subseteq \{0,1\}^N$ is called a property, and we say it is nontrivial 
if $\emptyset \neq A_n \subsetneq \{0,1\}^N$. Property $A_n$ is monotone increasing (or simply monotone) if for every 
$x \in A$ and $x \subseteq y$ we have $y \in A$. (Containment of formulas is defined in the natural way, 
namely $x \subseteq y$ iff every nonzero component of x is also nonzero in y.) We may drop the 
subscript n when it is unambiguous or unnecessary. A property is symmetric if there is a 
transitive permutation group under which it is invariant. For example, in (unplanted) SAT, 
the property of being unsatisfiable is monotone and symmetric. 

In this section and the next it is convenient to make a slight abuse of notation, and 
write $F(n,p)$ in place of $F(n,\alpha)$ to stress that clauses are included in $F(n,\alpha)$ according 
to a binomial ($p = \alpha n/N$) distribution. For a monotone property $A_n \subseteq \{0,1\}^N$, write 
$\mu_p(A_n) = \sum_{x \in A_n} \mu_p(x) = \mathbb{P}[F(n,p) \in A_n]$. It’s not difficult to show that if A is a nontrivial 
property then $\mu_p(A)$ is a strictly increasing and continuous function of p. For $\gamma \in (0,1)$, let 
$p_n(\gamma)$ be the value which (uniquely) satisfies $\mu_{p_n(\gamma)}(A_n) = \gamma$. We say that $\hat{p}_n$ is a threshold 
probability if 

$$\lim_{n\to\infty} \mathbb{P}[F(n,p_n) \in A] = \begin{cases} 1 & \text{if } p_n \gg \hat{p}_n \\ 0 & \text{if } p_n \ll \hat{p}_n \end{cases}$$

where the notation $p_n \ll \hat{p}_n$ indicates that $p_n/\hat{p}_n \to 0$ as n diverges. 

We make a distinction between properties exhibiting a very rapid transition versus those 
with a more gradual one. Formally, we say that A has a sharp threshold if for every $\gamma \in (0,1)$ 
there exists $p_\gamma = p_\gamma(n)$ such that $\mathbb{P}[F(n,p_\gamma) \in A] = \gamma$, and such that for every $\delta > 0$, 

$$\lim_{n\to\infty} \mathbb{P}[F(n,p) \in A] = \begin{cases} 1 & \text{if } p(n) > (1+\delta)p_\gamma(n) \\ 0 & \text{if } p(n) < (1-\delta)p_\gamma(n) \end{cases}$$

Equivalently, for $\tau \in (0,1)$ define $p_0, p_1, p_c$ such that $\mu_{p_0}(A) = \tau$, $\mu_{p_1}(A) = 1 - \tau$ and 
$\mu_{p_c}(A) = \frac{1}{2}$. Property A has a sharp threshold if the ratio $\frac{p_1 - p_0}{p_c}$ tends to 0. The threshold 
is coarse if this ratio is bounded away from 0, i.e. if there exists some constant C such that 
for some $\gamma \in (0,1)$ we have $p \cdot \frac{d\mu_p(A)}{dp}\big|_{p=p_\gamma} \leq C$. Friedgut and Kalai (see [32]) showed that in 
this case it must be true that $p_\gamma = o(1)$. 

A crucial contribution to the theory of sharp thresholds is due to Friedgut, in the form of 
a general existence theorem for sharp thresholds (see [30], [31]). Roughly, the theorem asserts 
that if a monotone symmetric property has a coarse threshold, then it can be approximated 
by the property of containing a small fixed subgraph. We omit the statement of Friedgut’s 
theorem since it does not apply in our setting; introducing a planted solution does away with 
the symmetry in the properties we are interested in. Fortunately in the appendix to [31], 
Bourgain gave an analogue of Friedgut’s result for nonsymmetric properties as follows. This is 
the theorem we will need to apply. 

Theorem 7 (Bourgain [31]). (See also [40]) Let $A_n \subseteq \{0,1\}^N$ be a monotone property, and 
$C > 0$ constant. Suppose $\mu_p$ is the product measure on $\{0,1\}^N$, i.e. $\mu_p(x) = p^{|x|}(1-p)^{N-|x|}$ for 
every x. Assume that there exists $\gamma \in (0,1)$ such that $\mu_{p_\gamma}(A_n) = \gamma$ and $p \cdot \frac{d\mu_p(A_n)}{dp}\big|_{p=p_\gamma} \leq C$ 
and $p_\gamma = o(1)$. Then there exists $\delta = \delta(C) > 0$ such that either 

1. $\mu_{p_\gamma}(x \in \{0,1\}^N : x \text{ contains } x' \in A_n \text{ of size } |x'| \leq 10C) > \delta$, or 

2. there exists $x' \notin A_n$ of size $|x'| \leq 10C$ such that the conditional probability satisfies 
$\mu_{p_\gamma}(x \in A_n \mid x' \subseteq x) > \gamma + \delta$. 

Friedgut’s theorem and Theorem 7 provide a framework for finding sharp thresholds that 
has been widely exploited. These theorems typically allow one to prove the existence of a 
sharp threshold whose value depends on n, whereas in many cases the threshold is believed to 
converge. Friedgut’s original application was to show that satisfiability for k-SAT has a sharp 
threshold. He also used the theorem to prove that in hypergraphs, the property of having a 
perfect matching, as well as 2-colourability, have sharp thresholds. With Achlioptas in [7], 
he proved that k-colourability of graphs (for fixed k) has a sharp threshold. Krivelevich 
and Nachmias in [40] showed the same for list-colourability of bipartite graphs. Their proof 
uses a neat combinatorial trick (due to Alon) of combining Theorem 7 with a theorem of 
Erdős and Simonovits. We use a similar approach in the next section. A comprehensive 
survey of applications of Friedgut’s theorem can be found in [30]. 

4.2 An n-dependent sharp threshold for planted CSPs 

Here, we prove Lemma 3. For a fixed $\phi > 0$, we are interested in the property $A_\phi (= A_\phi^n) = 
\{F \in \{0,1\}^N : Z(F) < 2^{\phi n}\}$. Clearly, $A_\phi$ is monotone increasing. We will show that it has a 
sharp (n-dependent) threshold. Now let $F = F(n,p)$ denote a CSP obtained as in (2). (We explain 
how the proof can be adjusted to handle $F(n,p)$ as in (3) in the Appendix Section C.) Lemma 3 
can be restated as follows. 

Lemma 8. For a fixed k and $\phi > 0$, the property $A_\phi$ has a sharp threshold. 

To prove Lemma 8 we will apply Bourgain’s Theorem (Theorem 7). The distribution of 
$F(n,p)$ does not satisfy the assumption on $\mu_p$ in the hypothesis of Theorem 7, since it is not a 
product measure. To overcome this difficulty we consider fixed plantings, and observe that 
conditioning on the random choice of $v^0$ doesn’t change the probability of the property $A_\phi$. 
By total probability, 

$$\mathbb{P}[F(n,p) \in A_\phi] = \sum_{v \in \{0,1\}^n} \mathbb{P}[F(n,p) \in A_\phi \mid v^0 = v]\,\mathbb{P}[v^0 = v].$$

Further, for any $v \in \{0,1\}^n$, the conditional probability satisfies 

$$\mathbb{P}[F(n,p) \in A_\phi \mid v^0 = v] = \mathbb{P}[F(n,p) \in A_\phi \mid v^0 = 0^n]$$

since the number of satisfying assignments is unchanged by swapping a variable with its 
negation. 
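
The invariance used here can be checked numerically on a toy instance. The sketch below is illustrative only and rests on assumptions made here: clauses are encoded as (edge, forbidden pattern) pairs for k-SAT, and the helper names are hypothetical. Negating a variable amounts to flipping its bit in every forbidden pattern that contains it, which induces a bijection between the solution sets of the two formulas.

```python
import itertools


def count_solutions(n, clauses):
    """Brute-force Z(F) for k-SAT clauses given as (edge, forbidden) pairs."""
    return sum(
        all(tuple(y[i] for i in edge) != forbidden for edge, forbidden in clauses)
        for y in itertools.product((0, 1), repeat=n)
    )


def flip_variable(clauses, i):
    """Swap variable i with its negation in every clause that contains it."""
    flipped = []
    for edge, forbidden in clauses:
        pattern = tuple(1 - b if v == i else b for v, b in zip(edge, forbidden))
        flipped.append((edge, pattern))
    return flipped


# Toy formula on 4 variables with two 3-clauses.
toy = [((0, 1, 2), (1, 0, 1)), ((1, 2, 3), (0, 0, 1))]
z_before = count_solutions(4, toy)
z_after = count_solutions(4, flip_variable(toy, 1))
```

Here `z_before` and `z_after` coincide, as the argument above predicts; applying `flip_variable` to each coordinate where a planted assignment equals 1 maps that planting to $0^n$.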

Therefore, if we let $F^0(n,p)$ denote a formula obtained by independently including each 
k-clause which is satisfied by $v^0 = 0^n$ with probability p, we have 
$\mathbb{P}[F(n,p) \in A_\phi] = \mathbb{P}[F^0(n,p) \in A_\phi]$. So to prove Lemma 8 it is enough to show that $A_\phi$ has a 
sharp threshold when $0^n$ is the planted solution. Now, the space we are working in is 
$\{0,1\}^{N'}$, where $N' = \binom{n}{k}(2^k - 1)$, and indeed $\mu_p(F) = p^{|F|}(1-p)^{N'-|F|}$. For the remainder 
of the proof, this will be the assumed setting. 

We now proceed to prove the sharp threshold. The idea is to assume for a contradiction 
that $A_\phi$ has a coarse threshold, and apply Bourgain’s theorem. We closely follow arguments 
found in [40] and [6]. Roughly, Bourgain’s theorem implies the existence of some fixed small 
formula $x'$ whose appearance in a random formula increases the probability of having property 
$A_\phi$ by a positive amount. Note that while not symmetric, $A_\phi$ is invariant under relabelings 
of the variable set (i.e. automorphisms of $\{v_1,\dots,v_n,\neg v_1,\dots,\neg v_n\}$ which map $\{v_1,\dots,v_n\}$ 
to itself and $\neg v_i$ to the negation of the image of $v_i$, for each i). This property is sometimes 
called permutation symmetry. Thus, containing a random (relabeled) copy of $x'$ has the same 
effect on the probability of having $A_\phi$. On the other hand, the assumption that the threshold 
is coarse implies that adding a large number of random clauses does not drastically change 
the probability of belonging to $A_\phi$. We will see that with the addition of a sufficient number 
of random clauses we can simulate the addition of $x'$. 

Proof of Lemma 8. Suppose for a contradiction that $A_\phi$ has a coarse threshold. Then there 
exist $\gamma$, $p_\gamma = o(1)$ and C as in Theorem 7, and so one of the two cases in its conclusion must 
hold. 

Case 1: $\mu_{p_\gamma}(x \in \{0,1\}^{N'} : x \text{ contains } x' \in A_\phi \text{ of size } |x'| \leq 10C) > \delta$. 

If the size of a formula $x'$ is at most 10C, then its clauses involve at most 10Ck variables. Since 
$x'$ is satisfied by $v^0$, assigning the planted value to the variables appearing in 
$x'$ and arbitrary values to the other variables yields a satisfying assignment. It follows that 
$Z(x') \geq 2^{n-10Ck} > 2^{\phi n}$ for large enough n, so $x' \notin A_\phi$. This proves that Case 1 cannot occur. 

Case 2: there exists $x' \notin A_\phi$ of size $|x'| \leq 10C$ such that the conditional probability satisfies 
$\mu_{p_\gamma}(x \in A_\phi \mid x' \subseteq x) > \gamma + \delta$. 

Clearly $x'$ is satisfied by $v^0$. Denote by $t \leq 10Ck$ the number of variables appearing in $x'$. 
Without loss of generality, assume these variables are $v_1,\dots,v_t$. For a t-tuple $v = (v_{i_1},\dots,v_{i_t})$ 
of distinct variables, we write $x'(v)$ to denote the result of relabeling each variable $v_j$ in $x'$ to 
$v_{i_j}$. Since $A_\phi$ has permutation symmetry, it follows that for any t-tuple v, the conditional 
probability satisfies $\mu_{p_\gamma}(x \in A_\phi \mid x'(v) \subseteq x) > \gamma + \delta$. We write $x^*$ to mean the result of taking $x'(v)$ 
after drawing a uniformly random t-tuple v. In other words, if a random formula $F^0(n,p_\gamma)$ is 
drawn, the union $F^0(n,p_\gamma) \cup x^*$ belongs to $A_\phi$ with probability at least $\gamma + \delta$. 

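Now, since $p \cdot \frac{d\mu_p(A_\phi)}{dp}\big|_{p=p_\gamma} \leq C$ it follows that $\lim_{\epsilon \to 0} \mu_{p_\gamma + \epsilon p_\gamma}(A_\phi) = \mu_{p_\gamma}(A_\phi) = \gamma$. Thus, 
for some $\epsilon$ we have $\mu_{p_\gamma + \epsilon p_\gamma}(A_\phi) \leq \gamma + \frac{\delta}{2}$. Further, (by a standard two-round exposure 
argument) choosing a formula $F^0(n, p_\gamma + \epsilon p_\gamma)$ is equivalent to choosing formulae $F^0(n,p_\gamma)$ 
and $F^0(n, \epsilon' p_\gamma)$ for some $\epsilon'$ and taking their union. Note that $\epsilon, \epsilon'$ don’t depend on n, since 
C does not. 

Denote by $x^*$ a random copy of $x'$ drawn as above. Then the above tells us that 

$$\mathbb{P}[F^0(n,p_\gamma) \cup x^* \in A_\phi] \geq \gamma + \delta$$

while 

$$\mathbb{P}[F^0(n,p_\gamma) \cup F^0(n,\epsilon' p_\gamma) \in A_\phi] \leq \gamma + \frac{\delta}{2}.$$

It follows that for some formula $H_0 \in \{0,1\}^{N'}$ we have 

$$\mathbb{P}[H_0 \cup x^* \in A_\phi] - \mathbb{P}[H_0 \cup F^0(n,\epsilon' p_\gamma) \in A_\phi] \geq \frac{\delta}{2}. \quad (4)$$

Clearly, $H_0 \notin A_\phi$. Let’s say that a t-tuple of distinct variables $v = (v_{i_1},\dots,v_{i_t}) \in 
\{v_1,\dots,v_n\}^t$ is bad if $Z(H_0 \cup x'(v)) < 2^{\phi n}$. It follows that at least a $\frac{\delta}{2}$ fraction of all $\binom{n}{t}t!$ 
t-tuples are bad. Let T be the set of bad tuples. We need the following theorem of Erdős 
and Simonovits [26]. 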

Theorem 9 (Erdős and Simonovits). Let k, t be positive integers and $0 < \gamma < 1$. There 
exists $\gamma' > 0$ such that for sufficiently large n, if $T \subseteq [n]^t$ is such that $|T| \geq \gamma n^t$ then with 
probability at least $\gamma'$ a random choice of t disjoint k-tuples $X_1,\dots,X_t$ from [n] satisfies that 
every t-tuple $(x_1,\dots,x_t)$ with $x_i \in X_i$ belongs to T. We say that $X_1,\dots,X_t$ is T-complete. 

We will obtain a contradiction from Theorem 9. Basically, we will ensure that with high 
probability, adding $F^0(n,\epsilon' p_\gamma)$ to $H_0$ implies adding clauses $C_1,\dots,C_t$, where each clause $C_i$ 
forces some variable to be set to its planted value, and the set of k-tuples of variables in the 
clauses is T-complete. 

Consider drawing t random clauses. Applying Theorem 9 with $\gamma = \frac{\delta}{2}$ we find some $\gamma'$ 
for which the t k-clauses are T-complete with probability at least $\gamma'$. Given that they are 
T-complete, the probability that they are each of the form $\chi_e(v_{i_1},\dots,v_{i_k}) = 1 \iff (v_{i_1},\dots,v_{i_k}) \neq 1^k$ 
(in the k-SAT or k-NAESAT case, or of the form $\chi_e(v_{i_1},\dots,v_{i_k}) = 1 \iff \bigoplus_j v_{i_j} = k \bmod 2$ in the 
k-XORSAT case) is $2^{-kt}$. 

Observe that each clause forces some variable to take the value 0, except in k-XORSAT when 
k is even and the clause forces some variable to take the value 1. 

We claim that adding t such clauses to $H_0$ yields a formula with fewer than $2^{\phi n}$ satisfying 
assignments. Indeed, suppose we have a satisfying assignment. Then at least one variable, 
say $c_i$, from each of the $C_i$ must be set to 0 (1 in the even k-XORSAT case). This is at least 
as restrictive as containing $x'((c_1,\dots,c_t))$, since $x'(0^t)$ is satisfied (and in the even k-XORSAT 
case, therefore $x'(1^t)$ is also satisfied). But $(c_1,\dots,c_t)$ is a bad tuple so there are fewer than 
$2^{\phi n}$ ways to extend these to the remaining variables to get a satisfying assignment for $H_0$. 

With high probability, $F^0(n,\varepsilon' p_\gamma)$ has $\Theta\big(\varepsilon' p_\gamma \binom{n}{k}(2^k-1)\big) \to \infty$ clauses. So if we draw $F^0(n,\varepsilon' p_\gamma)$, the probability that the clauses added don't include $t$ clauses which force a 0 variable as above is at most about $(1-\gamma' 2^{-kt})^{\Theta(\varepsilon' p_\gamma \binom{n}{k}(2^k-1))}$, which we can make as small as we like as $n \to \infty$. In particular, we can assume it is smaller than a fixed fraction of $\delta$. In the event that $F^0(n,\varepsilon' p_\gamma)$ does include these $t$ clauses $C_1,\ldots,C_t$, consider a satisfying assignment of $H_0 \cup C_1 \cup \cdots \cup C_t$. The probability that it satisfies a randomly chosen $k$-clause is $(1-2^{-k})$. Therefore, in this case the expected value of $Z(H_0 \cup F^0(n,\varepsilon' p_\gamma))$ is at most $k^t 2^{\psi n}(1-2^{-k})^{\Theta(\varepsilon' p_\gamma \binom{n}{k}(2^k-1))} < 2^{\psi n}$ with high probability. Applying Markov's inequality, we can ensure that with probability greater than $1-\frac{\delta}{2}$ the formula $H_0 \cup F^0(n,\varepsilon' p_\gamma) \in A_\psi$, contradicting (4). This proves Case 2 cannot occur and completes the proof of the lemma.

□ 
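
The last step of the proof rests on simple arithmetic: in expectation each fresh clause removes a $2^{-k}$ fraction of the surviving assignments, so enough clauses absorb a constant factor such as $k^t$. The following minimal Python sketch checks both facts by brute force for small parameters. It treats the fresh clause as uniformly random over sign patterns, as the proof does; the helper names `fraction_satisfying` and `clauses_needed` are illustrative and do not come from the paper.

```python
import itertools


def clause_satisfied(assignment, signs):
    """A k-SAT clause fails only when every one of its k literals is false.

    signs[i] is the value of the i-th variable that makes the i-th literal true.
    """
    return any(a == s for a, s in zip(assignment, signs))


def fraction_satisfying(k):
    """Fraction of (assignment, sign pattern) pairs on k variables that are satisfied."""
    cube = list(itertools.product((0, 1), repeat=k))
    satisfied = sum(clause_satisfied(a, s) for a in cube for s in cube)
    return satisfied / len(cube) ** 2


def clauses_needed(k, t):
    """Smallest m such that k^t * (1 - 2^-k)^m < 1."""
    m = 0
    while k ** t * (1 - 2.0 ** -k) ** m >= 1:
        m += 1
    return m


print(fraction_satisfying(3))  # 0.875, i.e. 1 - 2^-3
print(clauses_needed(3, 2))    # 17
```

Since the number of added clauses tends to infinity while $k$ and $t$ stay fixed, the threshold returned by `clauses_needed` is eventually exceeded.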


5 Open problems 

As mentioned in the introduction, it is not obvious that the conditions on the predicate $\chi$ used to obtain concentration (see Definition 1) are compatible with the conditions ruling out easily invertible functions [16]. The bottleneck here seems to be Hypothesis H, which, at a high level, translates the sub-additivity of the logarithm of the number of solutions (used to obtain concentration) into a local convexity property of the predicate $\chi$. If the convexity property were in fact necessary, then this would be in conflict with the choice of predicates that seem to avoid the undesirable balancedness properties, leaving the problem in a curious tension between concentration and hardness. It would hence be interesting to show that this is a limitation of our current proof technique, unless concentration really does have something to do with hardness.

Another interesting question would be to obtain convergence rates for the convergence in 
probability. We obtain an exponential rate in Theorem 5 using martingale arguments, but 
this does not apply to our results relying on Bourgain. 

Finally, the results in this paper are about the most basic properties of the solution space, 
namely, its cardinality. It would be interesting to understand rigorously finer properties of the
solution space for planted models to deduce proper choices of the rate and predicates for the 
Goldreich one-way function. 
A Freezing the threshold


In this section we prove Theorem 4. The proof essentially follows arguments in [4], but we 
give it here for completeness. 


Proof of Theorem 4. For $\alpha \in [0,\alpha^*]$, let $\phi_s(\alpha)$ denote the limit of the sequence $\psi_n(\alpha) = \frac{1}{n}\mathbb{E}[\log Z(F(n,\alpha))]$, which converges almost surely by Theorem 2.

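Let $\phi_0 = \phi_s(\alpha_0)$ for some $\alpha_0$. In view of Lemma 3 it is enough to show that the sequence $\alpha_n(\phi_0)$ obtained there converges (unless $\phi_0$ takes one of countably many values). Suppose that it does not. Let

$$\underline{\alpha}_0 = \liminf_{n\to\infty} \alpha_n(\phi_0)$$

and

$$\overline{\alpha}_0 = \limsup_{n\to\infty} \alpha_n(\phi_0).$$

By assumption $\underline{\alpha}_0$ and $\overline{\alpha}_0$ disagree. Then we can choose increasing sequences $\{m_i\}_{i=1}^{\infty}$ and $\{n_i\}_{i=1}^{\infty}$ such that

$$\lim_{i\to\infty} \alpha_{m_i}(\phi_0) = \overline{\alpha}_0$$

and

$$\lim_{i\to\infty} \alpha_{n_i}(\phi_0) = \underline{\alpha}_0.$$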

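Let $\underline{\alpha}_0 < \alpha < \overline{\alpha}_0$. Then there exists $\varepsilon > 0$ such that for sufficiently large $i$

$$Q_{m_i}(\alpha,\phi_0) \le Q_{m_i}(\alpha_{m_i}(\phi_0)-\varepsilon,\phi_0) \to 0 \quad \text{as } i \to \infty$$

and

$$Q_{n_i}(\alpha,\phi_0) \ge Q_{n_i}(\alpha_{n_i}(\phi_0)+\varepsilon,\phi_0) \to 1 \quad \text{as } i \to \infty.$$

Moreover since $\alpha > 0$ we have

$$Q_{m_i}(\alpha,\phi_0) = \mathbb{P}\big[Z(F(m_i,\alpha)) < 2^{\phi_0 m_i}\big] = \mathbb{P}\Big[\tfrac{1}{m_i}\log Z(F(m_i,\alpha)) < \phi_0\Big]$$

and so we have

$$\lim_{i\to\infty} \mathbb{E}\Big[\tfrac{1}{m_i}\log Z(F(m_i,\alpha))\Big] \ge \phi_0,$$

i.e.

$$\phi_s(\alpha) \ge \phi_0,$$

since the above expectation $\psi_n(\alpha)$ converges to $\phi_s(\alpha)$. A similar argument shows that

$$\lim_{i\to\infty} \mathbb{E}\Big[\tfrac{1}{n_i}\log Z(F(n_i,\alpha))\Big] \le \phi_0$$

and so

$$\phi_s(\alpha) \le \phi_0.$$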

It follows that the function $\phi_s$ is constant on $(\underline{\alpha}_0,\overline{\alpha}_0)$. Since $\phi_s$ is non-increasing on $[0,\infty)$ it follows from Froda's theorem that there are at most countably many values $\phi_0$ for which $\alpha_n(\phi_0)$ does not converge. This completes the proof. □
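
To make the quantity $Q_n(\alpha,\phi) = \mathbb{P}[\frac{1}{n}\log Z(F(n,\alpha)) < \phi]$ used in this proof tangible, here is a minimal brute-force sketch for very small $n$. It rests on several assumptions that are not taken from the paper: the formula has `round(alpha * n)` clauses, each drawn uniformly among the $k$-clauses satisfied by the planted assignment, logarithms are in base 2, and all function names are hypothetical.

```python
import itertools
import math
import random


def sample_planted_ksat(n, k, alpha, rng):
    """Draw a planted assignment and round(alpha * n) clauses that it satisfies.

    A clause is a list of (variable, value) literals and is satisfied when at
    least one listed variable takes the listed value.
    """
    planted = [rng.randint(0, 1) for _ in range(n)]
    clauses = []
    while len(clauses) < round(alpha * n):
        variables = rng.sample(range(n), k)
        clause = [(v, rng.randint(0, 1)) for v in variables]
        if any(planted[v] == val for v, val in clause):  # keep only planted-satisfied clauses
            clauses.append(clause)
    return planted, clauses


def count_solutions(n, clauses):
    """Z(F): brute-force count of satisfying assignments (feasible only for small n)."""
    return sum(
        all(any(x[v] == val for v, val in clause) for clause in clauses)
        for x in itertools.product((0, 1), repeat=n)
    )


def estimate_Q(n, k, alpha, phi, trials, seed=0):
    """Monte Carlo estimate of Q_n(alpha, phi) = P[(1/n) log2 Z(F(n, alpha)) < phi]."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        _, clauses = sample_planted_ksat(n, k, alpha, rng)
        z = count_solutions(n, clauses)  # z >= 1 because the planted assignment survives
        hits += math.log2(z) / n < phi
    return hits / trials
```

For instance, `estimate_Q(8, 3, 2.0, 0.5, 20)` returns an empirical frequency in $[0,1]$; with a fixed seed the estimate is non-decreasing in `phi`, mirroring the monotonicity used above.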

We are now all set to prove our main theorem, which we restate now for convenience. 

Theorem. For every $k \ge 2$, there exist a countable set $\mathcal{D}$ and a function $\phi_s : \mathbb{R}_{\ge 0} \to [0,1]$ such that for every $\alpha \notin \mathcal{D}$ and every $\varepsilon > 0$,

$$\lim_{n\to\infty} Q_n(\alpha,\phi_s(\alpha)-\varepsilon) = 0,$$

$$\lim_{n\to\infty} Q_n(\alpha,\phi_s(\alpha)+\varepsilon) = 1.$$

Proof of Theorem 1. Let $\phi_s$ be the function obtained in Lemma 3, and let $\mathcal{D}$ denote the (countable) set of its discontinuities. Assume $\alpha \in [0,\alpha^*] \setminus \mathcal{D}$, and let $\phi_s(\alpha) = \lim_{n\to\infty} \frac{1}{n}\mathbb{E}[\log Z(F(n,\alpha))]$ as in Theorem 2.

Lemma 3 implies that for some countable $C$, the limit $A(\phi) = \lim_{n\to\infty} \alpha_n(\phi)$ exists for each $\phi \in \phi_s([0,\alpha^*]) \setminus C$. Thus, there exists some $\varepsilon' < \varepsilon$ such that for $\phi^* := \phi_s(\alpha) - \varepsilon'$ the sequence $\alpha_n(\phi^*)$ converges to a limit $A(\phi^*) > \alpha$. Therefore, there exists $\delta > 0$ such that

$$Q_n(\alpha,\phi_s(\alpha)-\varepsilon) \le Q_n(\alpha,\phi^*) \le Q_n(\alpha_n(\phi^*)-\delta,\phi^*) \to 0.$$

It follows that $Q_n(\alpha,\phi_s(\alpha)-\varepsilon) \to 0$ as $n \to \infty$. A symmetric argument shows that $Q_n(\alpha,\phi_s(\alpha)+\varepsilon) \to 1$ as $n \to \infty$. This completes the proof. □


B Proof of Theorem 5 


Finally, we give the proof of Theorem 5.

Proof of Theorem 5. We consider the model of (3) for the Goldreich one-way function candidate. Let us denote by $X$ a uniformly drawn input in $\{0,1\}^n$ and by $G$ a random hypergraph of fixed density. The output of the Goldreich one-way function candidate is the vector $Y(X,G) = \{\chi(X[e])\}_{e \in E(G)}$. We denote the number of pre-images of this output by $Z(X,G)$. In what follows, we show that the random variable $L(G) := \mathbb{E}_X \log Z(X,G)$ concentrates around its expectation $\mathbb{E}_G L(G)$, which depends on $n$.

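For that purpose, we show that for any $\varepsilon > 0$,

$$\mathbb{P}_G\{|L(G) - \mathbb{E}_G L(G)| > n\varepsilon\} \le \text{(a quantity exponentially small in } n\text{)}, \tag{5}$$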


which implies that $|L(G)/n - \mathbb{E}_G L(G)/n|$ converges almost surely to 0 by the Borel-Cantelli Lemma. The above inequality results from a standard application of the Azuma-Hoeffding inequality [13], as used in [1] for more general models. Since the graph is Erdős-Rényi, we may equivalently consider the edges to be drawn uniformly at random (conditioning on the number of edges in the graph). We need to show that, if $e$ is an edge picked uniformly at random and $G \cup e$ is the augmented graph, the increment $L(G) - L(G \cup e)$ is bounded. In fact,


$$L(G) - L(G \cup e) = \mathbb{E}_X \log \frac{Z(X,G)}{Z(X,G \cup e)} \tag{6}$$

$$\le \log \mathbb{E}_X \frac{Z(X,G)}{Z(X,G \cup e)} \tag{7}$$

$$= -\log \mathbb{E}_X \mathbb{E}_{U|X} \mathbb{1}(\chi(X[e]) = \chi(U[e])) \tag{8}$$

where $U$ is a random vector uniformly drawn among all vectors $u \in \{0,1\}^n$ such that the output of $u$ is the same as the output of $X$ on the one-way function defined by $G$ and $\chi$. Note that $X$ and $U$ are not independent but exchangeable, i.e., they are independent conditionally on their common output $Y$. Therefore


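$$\mathbb{E}_X \mathbb{E}_{U|X} \mathbb{1}(\chi(X[e]) = \chi(U[e])) = \mathbb{E}_{X,U} \mathbb{1}(\chi(X[e]) = \chi(U[e])) \tag{9}$$

$$= \mathbb{E}_{X[e],U[e]} \mathbb{1}(\chi(X[e]) = \chi(U[e])) \tag{10}$$

$$= \sum_y \mathbb{E}_{X[e],U[e] \mid Y=y} \mathbb{1}(\chi(X[e]) = \chi(U[e]))\, \mathbb{P}\{Y = y\} \tag{11}$$

$$= \sum_y \big(\mathbb{P}\{S_0 \mid Y = y\}^2 + \mathbb{P}\{S_1 \mid Y = y\}^2\big)\, \mathbb{P}\{Y = y\} \tag{12}$$

where $\mathbb{P}\{S_i \mid Y = y\}$ is the probability that $X[e]$ belongs to $\chi^{-1}(i)$ given that $Y = y$, for $i = 0,1$. Since $\mathbb{P}\{S_0 \mid Y = y\} + \mathbb{P}\{S_1 \mid Y = y\} = 1$, we have $\mathbb{P}\{S_0 \mid Y = y\}^2 + \mathbb{P}\{S_1 \mid Y = y\}^2 \ge 1/2$, hence

$$\sum_y \big(\mathbb{P}\{S_0 \mid Y = y\}^2 + \mathbb{P}\{S_1 \mid Y = y\}^2\big)\, \mathbb{P}\{Y = y\} \ge 1/2, \tag{13}$$

and (8) is upper-bounded by $\log(2) = 1$. □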
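
The two facts behind this bound, namely that the collision probability is at least $1/2$ and that the increment $L(G) - L(G \cup e)$ is at most $1$, can be checked by brute force on a tiny instance. The sketch below is only an illustration under stated assumptions: the predicate `xor_and`, the sizes `n`, `k`, `m` and all function names are hypothetical choices, and logarithms are taken in base 2, consistent with $\log(2) = 1$.

```python
import itertools
import math
import random


def goldreich_output(x, edges, chi):
    """Output vector Y(x, G): the predicate applied to x restricted to each hyperedge."""
    return tuple(chi(tuple(x[i] for i in e)) for e in edges)


def preimage_classes(n, edges, chi):
    """Group all inputs in {0,1}^n by their output; each group is one pre-image set."""
    classes = {}
    for x in itertools.product((0, 1), repeat=n):
        classes.setdefault(goldreich_output(x, edges, chi), []).append(x)
    return classes


def avg_log_preimages(n, edges, chi):
    """L(G) = E_X log2 Z(X, G) for X uniform on {0,1}^n."""
    classes = preimage_classes(n, edges, chi)
    # A class of size z is hit with probability z / 2^n, and then Z = z.
    return sum(len(c) * math.log2(len(c)) for c in classes.values()) / 2 ** n


def collision_probability(n, edges, new_edge, chi):
    """P{chi(X[e]) = chi(U[e])}, with U uniform on the pre-image class of X."""
    total = 0.0
    for cls in preimage_classes(n, edges, chi).values():
        p1 = sum(chi(tuple(x[i] for i in new_edge)) for x in cls) / len(cls)
        total += (len(cls) / 2 ** n) * (p1 ** 2 + (1 - p1) ** 2)
    return total


def xor_and(bits):
    """Hypothetical 3-bit predicate, used for illustration only."""
    a, b, c = bits
    return a ^ (b & c)


rng = random.Random(0)
n, k, m = 8, 3, 6
edges = [tuple(rng.sample(range(n), k)) for _ in range(m)]
new_edge = tuple(rng.sample(range(n), k))

increment = avg_log_preimages(n, edges, xor_and) - avg_log_preimages(n, edges + [new_edge], xor_and)
collision = collision_probability(n, edges, new_edge, xor_and)
print(increment, collision)
```

Whatever predicate and hypergraph are plugged in, `collision` stays at or above $1/2$ and `increment` stays between $0$ and $1$, which is the bounded-difference property that the martingale argument needs.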


C A sharp $n$- and $\chi$-dependent threshold for CSPs from Goldreich's functions

To prove Lemma 8 for a Goldreich random CSP as in (3) we need only make a slight modification to the proof in Section 4.2. First, to put ourselves in the setting where we have a product measure we fix the predicate $\chi$ (we do not need to fix the planted solution as we did above). This implies that the threshold we obtain may depend on both $n$ and $\chi$. As before, let $A_\psi = \{F \in \{0,1\}^N : Z(F) < 2^{\psi n}\}$, and now let $F = F(n,p)$ denote a CSP obtained as in (3), with $\chi$ fixed and with a planted solution. The space we are working in is $\{0,1\}^N$, where $N = 2\binom{n}{k}$, and indeed $\mathbb{P}_p(F) = p^{|F|}(1-p)^{N-|F|}$.

Lemma 3 can be restated as follows. 

Lemma 10. For a fixed $k$ and $\psi > 0$, the property $A_\psi$ has a sharp threshold.

The only place in which the proof of Lemma 10 differs from the proof of Lemma 8 is in 
the application of Theorem 9, but we give the details for completeness. 

Proof of Lemma 10. Suppose for a contradiction that $A_\psi$ has a coarse threshold. Then there exist $\gamma$, $p_\gamma = o(1)$ and $C$ as in Theorem 7, and so one of the two cases in its conclusion must hold.

Case 1: $\mu_p(x \in \{0,1\}^N : x \text{ contains } x' \in A_\psi \text{ of size } |x'| \le 10C) > \delta$.

If the size of a formula $x'$ is $\le 10C$, then its clauses involve at most $10Ck$ variables. Suppose $x' \in A_\psi$. Since it is satisfied by the planted solution, assigning the planted value to the variables appearing in $x'$ and arbitrary values to the other variables yields a satisfying assignment. It follows that $Z(x') \ge 2^{n-10Ck} > 2^{\psi n}$ for large enough $n$, so $x' \notin A_\psi$. This proves that Case 1 cannot occur.

Case 2: there exists $x' \notin A_\psi$ of size $|x'| \le 10C$ such that the conditional probability satisfies $\mu_{p_\gamma}(x \in A_\psi \mid x' \subset x) > \gamma + \delta$.

Clearly $x'$ is satisfied by the planted solution. Denote by $t \le 10Ck$ the number of variables appearing in $x'$. Without loss of generality, assume these variables are $v_1,\ldots,v_t$. For a $t$-tuple $v = (v_{i_1},\ldots,v_{i_t})$ of distinct variables, we write $x'(v)$ to denote the result of relabeling each variable $v_j$ in $x'$ to $v_{i_j}$. Since $A_\psi$ has permutation symmetry, it follows that for any $t$-tuple $v$, the conditional probability satisfies $\mu_p(x \in A_\psi \mid x'(v) \subset x) > \gamma + \delta$. We write $x^*$ to mean the result of taking $x'(v)$ after drawing a uniformly random $t$-tuple $v$. In other words, if a random formula $F^0(n,p_\gamma)$ is drawn, the union $F^0(n,p_\gamma) \cup x^*$ belongs to $A_\psi$ with probability at least $\gamma + \delta$.

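Now, since $p \cdot \frac{d\mu_p(A_\psi)}{dp}\big|_{p=p_\gamma} \le C$, it follows that $\lim_{\varepsilon \to 0} \frac{\mu_{p_\gamma+\varepsilon p_\gamma}(A_\psi) - \mu_{p_\gamma}(A_\psi)}{\varepsilon} \le C$. Thus, for some $\varepsilon$ we have $\mu_{p_\gamma+\varepsilon p_\gamma}(A_\psi) < \gamma + \frac{\delta}{2}$. Further (by a standard two-round exposure argument), choosing a formula $F^0(n,p_\gamma+\varepsilon p_\gamma)$ is equivalent to choosing formulae $F^0(n,p_\gamma)$ and $F^0(n,\varepsilon' p_\gamma)$ for some $\varepsilon'$ and taking their union. Note that $\varepsilon,\varepsilon'$ don't depend on $n$, since $C$ does not.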

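Denote by $x^*$ a random copy of $x'$ drawn as above. Then the above tells us that

$$\mathbb{P}[F^0(n,p_\gamma) \cup x^* \in A_\psi] \ge \gamma + \delta$$

while

$$\mathbb{P}[F^0(n,p_\gamma) \cup F^0(n,\varepsilon' p_\gamma) \in A_\psi] < \gamma + \frac{\delta}{2}.$$

It follows that for some formula $H_0 \in \{0,1\}^N$ we have

$$\mathbb{P}[H_0 \cup x^* \in A_\psi] - \mathbb{P}[H_0 \cup F^0(n,\varepsilon' p_\gamma) \in A_\psi] > \frac{\delta}{2}. \tag{14}$$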

Clearly, $H_0 \notin A_\psi$. Let's say that a $t$-tuple of distinct variables $v = (v_{i_1},\ldots,v_{i_t}) \in \{v_1,\ldots,v_n\}^t$ is bad if $Z(H_0 \cup x'(v)) < 2^{\psi n}$. It follows that at least a $\frac{\delta}{2}$ fraction of all $\binom{n}{t}\,t!$ $t$-tuples are bad. Let $T$ be the set of bad tuples. We need Erdős and Simonovits' Theorem 9 again.

We will ensure that with high probability, adding $F^0(n,\varepsilon' p_\gamma)$ to $H_0$ implies adding clauses $C_1,\ldots,C_t$, where each clause $C_i$ forces some variable to be set to its planted value, and the set of $k$-tuples of variables in the clauses is $T$-complete.

Consider drawing $t$ random clauses. Applying Theorem 9 with $\gamma = \frac{\delta}{2}$ we find some $\gamma'$ for which the $t$ $k$-clauses are $T$-complete with probability at least $\gamma'$. Given that they are $T$-complete, the probability that they are each of the form $\chi(v_{i_1} \cdots v_{i_k}) = \chi(v^0_{i_1} \cdots v^0_{i_k})$ is $2^{-kt}$. By the antisymmetry of $\chi$, each such clause forces some variable to take the planted value.

We claim that adding $t$ such clauses to $H_0$ yields a formula with $< k^t 2^{\psi n}$ satisfying assignments. Indeed, suppose we have a satisfying assignment. Then at least one variable, say $c_i$, from each of the $C_i$ must be set to the planted value $c_i^0$. But $(c_1,\ldots,c_t)$ is a bad tuple so there are fewer than $2^{\psi n}$ ways to extend these to the remaining variables to get a satisfying assignment for $H_0$. Since there are at most $k^t$ choices of $(c_1,\ldots,c_t)$, the claim follows.

With high probability, $F^0(n,\varepsilon' p_\gamma)$ has $\Theta\big(\varepsilon' p_\gamma \binom{n}{k}(2^k-1)\big) \to \infty$ clauses. So if we draw $F^0(n,\varepsilon' p_\gamma)$, the probability that the clauses added don't include $t$ clauses which force a variable to its planted value as above is at most about $(1-\gamma' 2^{-kt})^{\Theta(\varepsilon' p_\gamma \binom{n}{k}(2^k-1))}$, which we can make as small as we like as $n \to \infty$. In particular, we can assume it is smaller than a fixed fraction of $\delta$. In the event that $F^0(n,\varepsilon' p_\gamma)$ does include these $t$ clauses $C_1,\ldots,C_t$, consider a satisfying assignment of $H_0 \cup C_1 \cup \cdots \cup C_t$. The probability that it satisfies a randomly chosen $k$-clause is $(1-2^{-k})$. Therefore, in this case the expected value of $Z(H_0 \cup F^0(n,\varepsilon' p_\gamma))$ is at most $k^t 2^{\psi n}(1-2^{-k})^{\Theta(\varepsilon' p_\gamma \binom{n}{k}(2^k-1))} < 2^{\psi n}$ with high probability. Applying Markov's inequality, we can ensure that with probability greater than $1-\frac{\delta}{2}$ the formula $H_0 \cup F^0(n,\varepsilon' p_\gamma) \in A_\psi$, contradicting (14). This proves Case 2 cannot occur and completes the proof of the lemma.

□ 